NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

Chang, Ting-Jui; Chaudhary, Sapana; Kalathil, Dileep; Shahrampour, Shahin (October 2023, Transactions on Machine Learning Research)

Full Text Available
Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments

Rengarajan, Desik; Chaudhary, Sapana; Kim, Jaewon; Kalathil, Dileep; Shakkottai, Srinivas (December 2022, Advances in neural information processing systems)

Full Text Available
Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Rengarajan, Desik; Chaudhary, Sapana; Jaewon, Kim; Kalathil, Dileep; Shakkottai, Srinivas (November 2022, Advances in neural information processing systems)

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The metapolicy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a suboptimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information—even if sub-optimal—to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.
more » « less
Full Text Available
Safe Online Convex Optimization with Unknown Linear Safety Constraints

Chaudhary Sapana; Kalathil, Dileep (February 2022, AAAI Conference on Artificial Intelligence (AAAI))

We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of ac- tions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has access to only the noisy observations of constraints for the chosen actions. We pro- pose an algorithm, called the Safe Online Projected Gradient Descent(SO-PGD) algorithm to address this problem. We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret O(T^2/3). While there are many algorithms for online convex optimization (OCO) problems with safety constraints avail- able in the literature, they allow constraint violations during learning/optimization, and the focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work that provides an algorithm with provable guarantees on the regret, without violating the linear safety constraints (with high probability) at any time step.
more » « less
Full Text Available

Search for: All records